Learning about Machine Learning: An Extended Assignment to Classify Twitter Accounts

Authors

  • Eni Mustafaraj
  • Scott D. Anderson
Abstract

We describe a four-week series of assignments in an undergraduate AI course at a liberal arts college, in which students develop a supervised learning solution to the problem of classifying Twitter accounts as either person accounts or non-person accounts (e.g., organizations or spambots). The problem uses real data from an ongoing research project by the first author, yet is accessible to students with limited programming expertise. The students experienced a complete cycle of creating a machine learning solution: exploring raw data, creating a training set, engineering features, comparing different classifiers, evaluating the results, and performing error analysis. We received positive feedback from the students and intend to refine the assignment and make it available (together with the created training data) for use by the research community.

Similar resources

Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data

Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...

Finding Sensitive Accounts on Twitter: An Automated Approach Based on Follower Anonymity

We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter, by examining the percentage of anonymous and identifiable followers the accounts have. We first designed a machine learning classifier to automatically determine if a Twitter account is anonymous or identifiable. We then classified an account as potentially sensitive based on the percentages ...

Exploring Twitter Hashtags

Twitter messages often contain so-called hashtags to denote keywords related to them. Using a dataset of 29 million messages, I explore relations among these hashtags with respect to co-occurrences. Furthermore, I present an attempt to classify hashtags into five intuitive classes, using a machine-learning approach. The overall outcome is an interactive Web application to explore Twitter hashtags.

Mining Anonymity: Identifying Sensitive Accounts on Twitter

We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter. One natural approach to this problem is to first create a list of sensitive keywords, and then identify Twitter accounts that use these words in their tweets. But such an approach may overlook sensitive accounts that are not covered by the subjective choice of keywords. In this paper, we inst...

Fame for sale: efficient detection of fake Twitter followers

Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere—hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of...

Publication date: 2011